Integrated Techniques for Phrase Extraction from Speech

Authors

  • Marie Meteer
  • Jan Robin Rohlicek
Abstract

We present an integrated approach to speech and natural language processing which uses a single parser to create training for a statistical speech recognition component and for interpreting recognized text. On the speech recognition side, our innovation is the use of a statistical model combining N-gram and context-free grammars. On the natural language side, our innovation is the integration of parsing and semantic interpretation to build references for only targeted phrase types. In both components, a semantic grammar and partial parsing facilitate robust processing of the targeted portions of a domain. This integrated approach introduces as much linguistic structure and prior statistical information as is available while maintaining a robust full-coverage statistical language model for recognition. In addition, our approach facilitates both the direct detection of linguistic constituents within the speech recognition algorithms and the creation of semantic interpretations of the recognized phrases.
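The abstract does not give the form of the combined model, so the following is only a minimal sketch of one common way to mix N-gram statistics with a phrase grammar, not the authors' implementation. It assumes a tiny hand-written phrase grammar (the hypothetical `PHRASE_GRAMMAR`), a greedy partial parser that replaces matched phrases with their nonterminal, and an add-one-smoothed bigram over the resulting mixed sequence of words and nonterminals. All names, the smoothing scheme, and the `vocab_size` default are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

# Hypothetical semantic phrase grammar: each nonterminal lists the word
# sequences it covers. A real grammar would be context-free, not a flat list.
PHRASE_GRAMMAR = {
    "<CITY>": [("boston",), ("new", "york")],
    "<DAY>": [("monday",), ("tuesday",)],
}


def partial_parse(words):
    """Greedy left-to-right partial parse.

    Returns a list of (token, expansion) pairs. A matched phrase becomes
    (nonterminal, matched_words); any other word becomes (word, None).
    """
    out, i = [], 0
    while i < len(words):
        best = None
        for nonterminal, expansions in PHRASE_GRAMMAR.items():
            for exp in expansions:
                matches = tuple(words[i:i + len(exp)]) == exp
                if matches and (best is None or len(exp) > len(best[1])):
                    best = (nonterminal, exp)
        if best is not None:
            out.append(best)
            i += len(best[1])
        else:
            out.append((words[i], None))
            i += 1
    return out


def train(sentences):
    """Collect bigram counts over parsed tokens and per-class expansion counts."""
    bigrams, contexts = Counter(), Counter()
    expansions = defaultdict(Counter)
    for sentence in sentences:
        parsed = partial_parse(sentence.split())
        tokens = ["<s>"] + [tok for tok, _ in parsed] + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
        for tok, exp in parsed:
            if exp is not None:
                expansions[tok][exp] += 1
    return bigrams, contexts, expansions


def log_prob(sentence, model, vocab_size=1000):
    """Bigram log-probability of the token sequence plus the log-probability
    of each phrase expansion within its class (both add-one smoothed)."""
    bigrams, contexts, expansions = model
    parsed = partial_parse(sentence.split())
    tokens = ["<s>"] + [tok for tok, _ in parsed] + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
    for tok, exp in parsed:
        if exp is not None:
            seen = sum(expansions[tok].values())
            n_rules = len(PHRASE_GRAMMAR[tok])
            total += math.log((expansions[tok][exp] + 1) / (seen + n_rules))
    return total
```

In this sketch the parser does double duty in the spirit of the abstract: it rewrites the training text for the language model, and the `(nonterminal, expansion)` pairs it returns mark the constituents that a later interpretation step could consume. Because statistics are shared across a phrase class, a sentence such as "fly to boston on tuesday" can score as well as a seen sentence even when that exact city and day combination never occurred in training.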

Similar resources

An Ontology-Based Approach for Key Phrase Extraction

Automatic key phrase extraction is fundamental to the success of many recent digital library applications and semantic information retrieval techniques, and it is a difficult and essential problem in Vietnamese natural language processing (NLP). In this work, we propose a novel method for key phrase extraction from Vietnamese text that exploits the Vietnamese Wikipedia as an ontology and exploits specif...

Multilingual Finite-state Noun Phrase Extraction

Anne Schiller, Rank Xerox Research Centre, 38240 Meylan, France. Abstract: The paper describes a tool for noun phrase mark-up based on finite-state techniques and statistical part-of-speech disambiguation. We illustrate the procedure with examples from realizations for seven languages (Dutch, English, French, German, Italian, Portuguese, and Spanish).

Keyphrase Extraction using Sequential Labeling

Keyphrases efficiently summarize a document’s content and are used in various document processing and retrieval tasks. Several unsupervised techniques and classifiers exist for extracting keyphrases from text documents. Most of these methods operate at a phrase-level and rely on part-of-speech (POS) filters for candidate phrase generation. In addition, they do not directly handle keyphrases of ...

Improving prosodic phrase prediction by unsupervised adaptation and syntactic features extraction

In state-of-the-art speech synthesis systems, prosodic phrase prediction is the most serious problem, accounting for about 40% of text analysis errors. Two optimization strategies are proposed in this paper to deal with two major types of prosodic phrase prediction errors. First, an unsupervised adaptation method is proposed to alleviate the mismatch between training and testing. Seco...

Determining the Boundary and Type of Syntactic Phrases in Persian Texts

Text tokenization is the process of segmenting text into meaningful tokens such as words, phrases, sentences, etc. Tokenization into syntactic phrases, known as chunking, is an important preprocessing step needed in many applications such as machine translation, information retrieval, text to speech, etc. In this paper chunking of Farsi texts is done using statistical and learning methods and the grammat...

Publication date: 1994